Non-Overlapping Hierarchical Index Structure for Similarity Search
Authors
Abstract
To accelerate similarity search in high-dimensional databases, we propose a new hierarchical indexing method. It consists of an offline phase and an online phase, and our contribution concerns both. In the offline phase, after grouping all the data into clusters and constructing a hierarchical index, the main originality of our contribution is a method for constructing bounding forms of the clusters that avoids overlap. In the online phase, this idea considerably improves the performance of similarity search; for this second phase, we have also developed an adapted search algorithm. Our method, named NOHIS (Non-Overlapping Hierarchical Index Structure), uses Principal Direction Divisive Partitioning (PDDP) as its clustering algorithm. The principle of PDDP is to divide the data recursively into two sub-clusters; each division uses the hyperplane that is orthogonal to the principal direction derived from the covariance matrix and that passes through the centroid of the cluster being divided. The data of each of the two resulting sub-clusters are enclosed by a minimum bounding rectangle (MBR). The two MBRs are oriented along the principal direction, which guarantees that the two forms do not overlap. Experiments use databases containing image descriptors. Results show that the proposed method outperforms sequential scan and the SR-tree in processing k-nearest neighbor queries.
Keywords: K-Nearest Neighbor Search, Multidimensional Indexing, Multimedia Databases, Similarity Search.
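The PDDP split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`principal_direction`, `pddp_split`, `extent`), the use of power iteration to approximate the leading eigenvector of the covariance matrix, and the synthetic 2-D data are all assumptions made for illustration. The sketch performs a single split and reports only the extent of each sub-cluster along the principal direction, which is the axis along which the two bounding rectangles are separated; a full oriented MBR would also need extents along the remaining axes.

```python
import random


def principal_direction(points, iters=200, seed=0):
    """Return (centroid, v), where v approximates the leading eigenvector
    of the covariance matrix, found by power iteration (an assumption here)."""
    n, d = len(points), len(points[0])
    centroid = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - centroid[j] for j in range(d)] for p in points]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iters):
        # w = X^T (X v): proportional to (covariance matrix) * v
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return centroid, v


def pddp_split(points):
    """Split points with the hyperplane through the centroid that is
    orthogonal to the principal direction. Each side holds (projection, point)."""
    centroid, v = principal_direction(points)
    left, right = [], []
    for p in points:
        s = sum((p[j] - centroid[j]) * v[j] for j in range(len(v)))
        (left if s <= 0.0 else right).append((s, p))
    return v, left, right


def extent(side):
    """Interval covered by one sub-cluster along the principal direction."""
    values = [s for s, _ in side]
    return min(values), max(values)


# Hypothetical data: 200 points stretched roughly along the direction (2, 1).
rng = random.Random(1)
points = []
for _ in range(200):
    t = rng.uniform(-10.0, 10.0)
    points.append([t, 0.5 * t + rng.gauss(0.0, 0.5)])

v, left, right = pddp_split(points)
left_lo, left_hi = extent(left)
right_lo, right_hi = extent(right)
print("principal direction:", v)
print("left extent:", (left_lo, left_hi), "right extent:", (right_lo, right_hi))
```

Because every point of the first sub-cluster projects to a value at or below zero and every point of the second to a value above zero, the two intervals along the principal direction cannot intersect, so rectangles aligned with that direction cannot overlap either. Applying `pddp_split` recursively to each side would produce the hierarchy.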
Related resources
NOHIS-Tree: High-Dimensional Index Structure for Similarity Search
In Content-Based Image Retrieval systems it is important to use an efficient indexing technique in order to perform and accelerate the search in huge databases. The indexing technique used should also support the high dimensionality of image features. In this paper we present the hierarchical index NOHIS-tree (Non Overlapping Hierarchical Index Structure) when we scale up to very large databases. W...
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
Hierarchical Overlapping Clustering of Network Data Using Cut Metrics
A novel method to obtain hierarchical and overlapping clusters from network data – i.e., a set of nodes endowed with pairwise dissimilarities – is presented. The introduced method is hierarchical in the sense that it outputs a nested collection of groupings of the node set depending on the resolution or degree of similarity desired, and it is overlapping since it allows nodes to belong to more ...
DAHC-tree: An Effective Index for Approximate Search in High-Dimensional Metric Spaces
Similarity search in high-dimensional metric spaces is a key operation in many applications, such as multimedia databases, image retrieval, object recognition, and others. The high dimensionality of the data requires special index structures to facilitate the search. A problem regarding the creation of suitable index structures for high-dimensional data is the relationship between the geometry o...
A Hierarchical Bitmap Indexing Method for Similarity Search in High-Dimensional Multimedia Databases
This paper proposes an efficient indexing mechanism for similarity search in high-dimensional multimedia databases that quickly filters out irrelevant objects using a novel indexing structure called HBI (Hierarchical Bitmap Index). In this bitmap index, the feature (or attribute) value of an object at each dimension is represented with a set of two bits, each of which indicates whether it is rel...